Improved Topic Classification Model Using K-norm Base

نویسندگان

  • Xiang Li
  • Ea-Ee Jan
  • Cheng Wu
چکیده

Maximum Entropy (MaxEnt) model has been proven to be a very effective approach in the topic classification task, where a specific topic from a pre-defined topic set will be assigned to each sentence. Although it is originally developed based on the motivation of maximizing the conditional probability entropy under certain constraints, MaxEnt model is indeed an exponential distribution model that maximizes the log-likelihood of the training data. This log-likelihood criterion bears similarity with the classification accuracy criterion, which is the ultimate performance measure of a topic classifier. But these two criterion still differ from each other, and their discrepancy consequently reduces the benefit of optimization in improving classification accuracy. In this paper we propose to use different objective functions, which are closer to the classification accuracy criterion, to replace the log-likelihood objective used in the MaxEnt model estimation process. Specifically, we propose a Summation-Log K-norm objective and a Summation K-norm objective. Our experiments conducted on two large volume topic classification dataset prove the effectiveness of our new objectives in improving topic classification performance on top of the state-of-art MaxEnt model.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...

متن کامل

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...

متن کامل

Critical Drag Investigation for an Axisymmetric Projectile with Choked Base Bleed at High-Subsonic and Transonic Regime Using SST K-? Model

In the following paper, the effects of a choked jet exhausted from the base of a non-lifting body on its total and base drags at sub-sonic and transonic regimes has been numerically investigated. Having surveyed the results of some turbulence models and after comparing with experimental results, an appropriate turbulence model i.e. SST K-?, has been chosen and this model has been used in the su...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006